Interactive Correlation Of Compound Information
And Genomic Information 



Cross-Reference to Related Application 
This application is related to provisional patent application serial no. 60/240,118,
filed October 12, 2000, from which priority is claimed under 35 USC § 119(e)(1) and
which is incorporated herein by reference in its entirety.



Field of the invention 

This invention relates to methods and products for identifying pharmaceutical
leads, correlating information regarding gene expression, biological assays and other
relevant information, and facilitating the purchase of related products.



Background of the Invention 

Genomic sequence information is now available for several organisms, and additional
data is added continuously. However, only a small fraction of the open reading
frames now sequenced correspond to genes of known function: the function of most
polynucleotide sequences, and any encoded proteins, is still unknown. These genes are
now studied by means of, inter alia, polynucleotide arrays, which quantify the amount of
mRNA produced by a test cell (or organism) under specific conditions. "Chemical genomic
annotation" is the process of determining the transcriptional and bioassay response
of one or more genes to exposure to a particular chemical, and defining and interpreting
such genes in terms of the classes of chemicals with which they interact. A comprehensive
library of chemical genomic annotations would enable one to design and optimize new
pharmaceutical lead compounds based on the probable transcriptional and biomolecular
profile of a hypothetical compound with certain characteristics. Additionally, one can
use chemical genomic annotations to determine relationships between genes (for
example, as members of a signal pathway or protein-protein interaction pair), and aid in
determining the causes of side effects and the like. Finally, presenting the drug design
researcher with a body of chemical genomic annotation information will generate
research hypotheses that will stimulate follow-on experimental design, and therefore
enable and stimulate purchase of related products to execute such experiments.

Sabatini et al., US 5,966,712 disclosed a database and system for storing, comparing
and analyzing genomic data.

Maslyn et al., US 5,953,727 disclosed a relational database for storing genomic
data.

Kohler et al., US 5,523,208 disclosed a database and method for comparing
polynucleotide sequences and the predicted functions of their encoded proteins.

Fujimiya et al., US 5,706,498 disclosed a database and retrieval system for
identifying genes of similar sequence.

Summary of the Invention 

We have now invented a system and method for analyzing and exploring the data
resulting from chemical genomic annotation experiments, and for facilitating the design
by a user of further experiments related to the user's goals, and thereby encouraging the
purchase by the user of products related to the data and additional experiments.

One aspect of the invention is a method for evaluating a test compound for
biological activity, comprising: providing a database comprising a plurality of reference
gene expression profiles, each profile comprising a representation of the expression level
of a plurality of genes in a test cell exposed to a reference compound and a representation
of the reference compound; providing a test gene expression profile, comprising a
representation of the expression level of a plurality of genes in a test cell exposed to said test
compound; comparing said test gene expression profile with said reference gene expression
profiles; identifying at least one reference gene expression profile that is similar to said test
gene expression profile; displaying said identified expression profile; and displaying
product information related to said identified expression profile.

Another aspect of the invention is a system for performing the method of the 
invention. 

Another aspect of the invention is a computer-readable medium having encoded
thereon a set of instructions enabling a computer system to perform the method of the
invention.

Brief Description of the Figures

Fig. 1 is a diagram of an embodiment of a system of the invention. 

Fig. 2 is a flow diagram illustrating an embodiment of a method of the invention. 


Detailed Description 

Definitions:

The term "test compound" refers in general to a compound to which a test cell is
exposed, about which one desires to collect data. Typical test compounds will be small
organic molecules, typically prospective pharmaceutical lead compounds, but can include
proteins, peptides, polynucleotides, heterologous genes (in expression systems),
plasmids, polynucleotide analogs, peptide analogs, lipids, carbohydrates, viruses, phage,
parasites, and the like.

The term "biological activity" as used herein refers to the ability of a test
compound to alter the expression of one or more genes.

The term "test cell" refers to a biological system or a model of a biological system 
capable of reacting to the presence of a test compound, typically a eukaryotic cell or 
tissue sample, or a prokaryotic organism. 

The term "gene expression profile" refers to a representation of the expression
level of a plurality of genes in response to a selected expression condition (for example,
incubation in the presence of a standard compound or test compound). Gene expression
profiles can be expressed in terms of an absolute quantity of mRNA transcribed for each
gene, as a ratio of mRNA transcribed in a test cell as compared with a control cell, and
the like. As used herein, a "standard" gene expression profile refers to a profile already
present in the primary database (for example, a profile obtained by incubation of a test
cell with a standard compound, such as a drug of known activity), while a "test" gene
expression profile refers to a profile generated under the conditions being investigated.
The term "modulated" refers to an alteration in the expression level (induction or
repression) to a measurable or detectable degree, as compared to a pre-established standard (for
example, the expression level of a selected tissue or cell type at a selected phase under
selected conditions).

The term "correlation information" as used herein refers to information related to
a set of results. For example, correlation information for a profile result can comprise a
list of similar profiles (profiles in which a plurality of the same genes are modulated to a
similar degree, or in which related genes are modulated to a similar degree), a list of
compounds that produce similar profiles, a list of the genes modulated in said profile, a
list of the diseases and/or disorders in which a plurality of the same genes are modulated
in a similar fashion, and the like. Correlation information for a compound-based inquiry
can comprise a list of compounds having similar physical and chemical properties,
compounds having similar shapes, compounds having similar biological activities,
compounds that produce similar expression array profiles, and the like. Correlation
information for a gene- or protein-based inquiry can comprise a list of genes or proteins having
sequence similarity (at either nucleotide or amino acid level), genes or proteins having
similar known functions or activities, genes or proteins subject to modulation or control
by the same compounds, genes or proteins that belong to the same metabolic or signal
pathway, genes or proteins belonging to similar metabolic or signal pathways, and the
like. In general, correlation information is presented to assist a user in drawing parallels
between diverse sets of data, enabling the user to create new hypotheses regarding gene
and/or protein function, compound utility, and the like. Product correlation information
assists the user with locating products that enable the user to test such hypotheses, and
facilitates their purchase by the user.

A "hypothesis" as used herein refers to a testable idea, inspired by correlation
information, regarding an explanation or model of gene or protein function, biochemical
or biological function, drug or compound activity or toxicity, absorption, metabolism,
distribution, excretion, and the like. Typical hypotheses herein include, without
limitation, the identification of a compound or class of compounds as potential lead compounds
or drugs, identification of genes or proteins that are characteristic of a disease state or
adverse reaction, identification of genes and/or proteins that interact, and the like.

"Similar", as used herein, refers to a degree of difference between two quantities
that is within a preselected threshold. For example, two genes can be considered
"similar" if they exhibit sequence identity of more than a given threshold, such as for example
20%. A number of methods and systems for evaluating the degree of similarity of
polynucleotide sequences are publicly available, for example BLAST, FASTA, and the like.
See also Maslyn et al. and Fujimiya et al., supra, incorporated herein by reference. The
similarity of two profiles can be defined in a number of different ways, for example in
terms of the number of identical genes affected, the degree to which each gene is
affected, and the like. Several different measures of similarity, or methods of scoring
similarity, can be made available to the user: for example, one measure of similarity
considers each gene that is induced (or repressed) past a threshold level, and increases the
score for each gene in which both profiles indicate induction (or repression) of that gene.
For example, if g_x is gene "x", p_ex is the expression level of g_x in an experimental
profile, p_sx is the expression level of g_x in a standard profile, and p_t is a predetermined
threshold level, we can define a function H_x for any experimental ("E") and standard ("S")
profile pair, with H_x = 1 when both p_ex and p_sx > p_t, and H_x = 0 when either p_ex or
p_sx < p_t. Then, a simple similarity score can be defined as N = Σ_x H_x. This similarity
score counts only the genes that are similarly induced in both profiles. A more informative
score can be calculated as N' = Σ_x (H_x) * |p_ex - p_sx| * (p_ex * p_sx)^[exponent illegible],
which also takes into consideration the difference in expression level between the
experimental and standard profiles, for each gene induced above the threshold level. Other
statistical methods are also applicable.
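
The two scores defined above can be sketched in Python as follows. This is an
illustrative sketch only, not the applicants' implementation: the function names, the
use of dictionaries keyed by gene identifier, and the treatment of a level exactly equal
to the threshold (counted here as not induced) are assumptions. Because the exponent on
the product term of N' cannot be read reliably from the text, it is left as a
caller-supplied parameter rather than fixed, and expression levels are assumed positive.

```python
from typing import Dict


def h_indicator(p_ex: float, p_sx: float, p_t: float) -> int:
    """H_x: 1 when both expression levels exceed the threshold p_t, else 0.

    Assumption: a level exactly equal to p_t is treated as not induced.
    """
    return 1 if (p_ex > p_t and p_sx > p_t) else 0


def simple_score(experimental: Dict[str, float],
                 standard: Dict[str, float],
                 p_t: float) -> int:
    """N = sum over genes x of H_x, using only genes present in both profiles."""
    shared = experimental.keys() & standard.keys()
    return sum(h_indicator(experimental[g], standard[g], p_t) for g in shared)


def weighted_score(experimental: Dict[str, float],
                   standard: Dict[str, float],
                   p_t: float,
                   exponent: float) -> float:
    """N' = sum_x H_x * |p_ex - p_sx| * (p_ex * p_sx) ** exponent.

    The exponent is a hypothetical parameter supplied by the caller; the
    text does not give a legible value for it.
    """
    total = 0.0
    for g in experimental.keys() & standard.keys():
        p_ex, p_sx = experimental[g], standard[g]
        if h_indicator(p_ex, p_sx, p_t):
            total += abs(p_ex - p_sx) * (p_ex * p_sx) ** exponent
    return total
```

For example, with experimental levels {g1: 3.0, g2: 0.5, g3: 4.0}, standard levels
{g1: 2.0, g2: 3.0, g3: 4.0}, and p_t = 1.0, genes g1 and g3 are induced in both profiles,
so N = 2.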

The term "product information" as used herein refers to information regarding the
availability, characteristics, price, and the like, of a product. Product information can
consist of a hyperlink to such information. A product "related to data" refers to a product
useful for the further exploration of the gene, protein, system, and/or compound to which
the data pertains, or to relationships between the gene, protein, system, and/or compound
highlighted in the correlation information. Exemplary products include, for example,
bioassay kits and reagents, compounds useful as positive and negative controls, kits for
purifying proteins or other biological products, antibodies for determining and/or
isolating substances, compounds similar to the test compound useful for further study,
additional data regarding gene or protein function and/or relationships (for example, sequence
data from other species, information regarding metabolic and/or signal pathways to which
the gene or protein belong, and the like), DNA microarrays useful for determining
expression of the gene and/or related genes, information and analysis regarding features
of a compound that are likely to be responsible for the observed activity, and the like.

The term "hyperlink" as used herein refers to a feature of a displayed image or text
that, when activated (for example by clicking on the hyperlink), provides information
additional and/or related to the information currently displayed. An HTML HREF is
an example of a hyperlink within the scope of this invention. For example, when a user
queries the database of the invention and obtains an output such as a list of the genes
most induced or repressed by a selected compound, one or more of the genes listed in the
output can be hyperlinked to related information. The related information can be, for
example, additional information regarding the gene, a list of compounds that affect gene
induction in a similar way, a list of genes having a known related function, a list of
bioassays for determining activity of the gene product, product information regarding such
related information, and the like.

General Method:

The system of the invention provides a correlative database that permits one to
study relationships between different genes and between genes and a variety of compounds,
to investigate structure-function relationships between different compounds, and to
facilitate the purchase of products based on such observed relationships. The database
contains a plurality of standard gene expression profiles, which comprise the expression level
of a plurality of genes under a plurality of specified conditions. The conditions specified
can include expression within a particular cell type (for example, fibroblast, lymphocyte,
neuron, oocyte, hepatocyte, and the like), expression at a particular point in the cell cycle
(e.g., G1), expression in a specified disease state, the presence of environmental factors
(for example, temperature, pressure, CO2 partial pressure, osmotic pressure, shear stress,
confluency, adherence, and the like), the presence of pathogenic organisms (for example,
viruses, bacteria, fungi, and extra- or intracellular parasites), expression in the presence
of heterologous genes, expression in the presence of test compounds, and the like, and
combinations thereof. The database can contain expression profiles for a plurality of
different species, for example, human, mouse, rat, chimpanzee, yeast such as Saccharomyces
cerevisiae, bacteria such as E. coli, and the like. The database preferably comprises
expression profiles for at least 10 different genes from a particular organism, more
preferably in excess of 500 genes, and can include a substantial fraction of the genes
expressed by an organism, such as, for example, about 50%, about 75%, about 90%, or
essentially 100%. The standard expression profiles are preferably annotated, for
example, with information regarding the conditions under which the profile was obtained.
Preferably, the database also contains annotations for one or more genes, more preferably
for each gene represented in the database. The annotations can include any available
information about the gene, such as, for example, the gene's names and synonyms, the
gene's nucleotide sequence, the amino acid sequence encoded, any known biological
activity or function, any genes of similar sequence, any metabolic or protein interaction
pathways to which it is known to belong, a listing of assays capable of determining the
activity of its protein product, and the like.

The database contains interpretive gene expression profiles and bioassay profiles
for a plurality of different compounds that comprise a representation of a compound's
mode of action and/or toxicity ("drug signatures"), and can include experimental
compounds and/or "standard" compounds. Drug signatures provide a unique picture of a
compound's comprehensive activity in vivo, including both its effect on gene
transcription and its interaction with proteins. Standard compounds are preferably
well-characterized, and preferably exhibit a known biological effect on host cells and/or organisms.
Standard compounds can advantageously be selected from the class of available drug
compounds, natural toxins and venoms, known poisons, vitamins and nutrients, metabolic
byproducts, and the like. The standard compounds can be selected to provide, as a set, a
wide range of different gene expression profiles. The records for the standard
compounds are preferably annotated with information available regarding the compounds,
such as, for example, the compound name, structure and chemical formula, molecular
weight, aqueous solubility, pH, lipophilicity, known biological activity, source, proteins
and/or genes it is known to interact with, assays for detecting and/or confirming activity
of the compound or related compounds, and the like. Alternatively, one can employ a
database constructed from random compounds, combinatorial libraries, and the like.

The database further contains bioassay data derived from experiments in which
one or more compounds represented in the database are examined for activity against one
or more proteins represented in the database. Bioassay data can be obtained from open
literature and directly by experiment.

Further, the database preferably contains product data related to the compounds,
genes, proteins, expression profiles, and/or bioassay data otherwise present in the
database. The product data can be information regarding physical products, such as bioassay
kits and reagents, compounds useful as positive and negative controls, compounds similar
to the test compound useful for further study, DNA microarrays and the like, or can
comprise information-based products, such as additional data regarding gene or protein
function and/or relationships (for example, sequence data from other species, information
regarding metabolic and/or signal pathways to which the gene or protein belong, and the
like), algorithmic analysis of the compounds to determine critical features and likely
cross-reactivity, and the like. The product information can take the form of data or
information physically present in the database, hyperlinks to external information sources
(such as a vendor's catalog, for example, supplied via the Internet or CD-ROM), and the
like.

The database thus preferably contains five main types of data: gene information,
compound information, bioassay information, product information, and profile
information. Gene information comprises information specific to each included gene, and can
include, for example, the identity and sequence of the gene, one or more unique
identifiers linked to public and/or commercial databases, its location on a standard array plate,
a list of genes having similar sequences, any known disease associations, any known
compounds that modulate the encoded protein activity, conditions that modulate
expression of the gene or modulate the protein activity, and the like. Product information
comprises information specific to the available products, and varies depending on the exact
nature of the product, and can include information such as price, manufacturer, contents,
warranty information, availability, delivery time, distributor, and the like. Bioassay
information comprises information specific to particular compounds (where available),
and can include, for example, results from high-throughput screening assays, cellular
assays, animal and/or human studies, biochemical assays (including binding assays and
enzymatic assays) and the like. Compound information comprises information specific to
each included compound, such as, for example, the chemical name(s) and structure of the
compound, its molecular weight, solubility and other physical properties, proteins that it
is known to interact with, the profiles in which it appears, the genes that are affected by
its presence, and available assays for its activity. Profile information includes, for
example, the conditions under which it was generated (including, for example, the cell
type(s) used, the species used, temperature and culture conditions, compounds present,
time elapsed, and the like), the genes modulated with reference to a standard, a list of
similar profiles, and the like. The information is obtained by assimilation of and/or
reference to currently-available databases, and by collecting experimental data. It should be
noted that the gene database, although large, contains a finite number of records, limited
by the number of genes in the organisms under study. The compound database is
potentially unlimited, as new compounds are made and tested constantly. The profile database,
however, is still larger, as it represents information regarding the interaction of a very
large number of genes with a potentially infinite number of different compounds, under a
variety of conditions.
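
The five data types described above might be modeled, in a minimal way, as in the
following Python sketch. All class names, field names, and lookup methods are
hypothetical and chosen only for illustration; each record carries a small subset of the
fields listed in the text, and nothing here is intended to represent the actual schema of
the database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GeneRecord:
    gene_id: str
    sequence: str = ""
    disease_associations: List[str] = field(default_factory=list)


@dataclass
class CompoundRecord:
    name: str
    molecular_weight: Optional[float] = None


@dataclass
class BioassayRecord:
    compound: str      # name of the compound assayed
    target_gene: str   # gene whose protein product was assayed
    result: str


@dataclass
class ProductRecord:
    name: str
    vendor: str
    price: Optional[float] = None
    related_genes: List[str] = field(default_factory=list)


@dataclass
class ProfileRecord:
    profile_id: str
    compound: str                 # compound present when the profile was generated
    conditions: Dict[str, str]    # e.g. cell type, species, time elapsed
    expression: Dict[str, float]  # gene_id -> expression level


@dataclass
class CorrelativeDatabase:
    genes: Dict[str, GeneRecord] = field(default_factory=dict)
    compounds: Dict[str, CompoundRecord] = field(default_factory=dict)
    bioassays: List[BioassayRecord] = field(default_factory=list)
    products: List[ProductRecord] = field(default_factory=list)
    profiles: Dict[str, ProfileRecord] = field(default_factory=dict)

    def profiles_for_compound(self, name: str) -> List[ProfileRecord]:
        """Profiles generated in the presence of the named compound."""
        return [p for p in self.profiles.values() if p.compound == name]

    def products_for_gene(self, gene_id: str) -> List[ProductRecord]:
        """Products whose record lists the given gene as related."""
        return [p for p in self.products if gene_id in p.related_genes]
```

Keeping profiles in their own collection, keyed separately from genes and compounds,
reflects the observation above that the profile data grows with the number of
gene, compound, and condition combinations rather than with either list alone.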

Experimental data is preferably collected using a high-throughput assay format,
capable of examining, for example, the effects of a plurality of compounds (preferably a
large number of standard compounds, for example 10,000) when administered
individually or as a mixture to a plurality of different cell types. Assay data collected using a
uniform format are more readily comparable, and provide a more accurate indication of
the differences between, for example, the activity of similar compounds, or the
differences in sensitivity of similar genes.

The system provides several different ways to access the information contained
within the database. An operator can enter a test gene expression profile into the system,
cause the system to compare the test profile with stored standard gene expression profiles
in the database, and obtain an output comprising one or more standard expression profiles
that are similar to the test profile. The standard expression profiles are preferably
accompanied by annotations, for example providing information to the operator as to the
similarity of the test profile to standard profiles obtained from disease states and/or standard
compounds. The test gene expression profile preferably includes an indication of the
conditions under which the profile is obtained, for example a representation of a test
compound used, and/or the culture conditions.

The output preferably further comprises a list of the genes that are modulated
(up-regulated or down-regulated) in the test gene expression profile, as compared with a
pre-established expression value, a pre-selected standard expression profile, a second test
gene expression profile, or another pre-set threshold value.
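
One way such a list of modulated genes could be derived is sketched below in Python.
The sketch assumes that expression levels are positive quantities and that "modulated"
is judged by a fold-change ratio against a reference profile; the function name and the
default two-fold cutoff are illustrative assumptions, not values taken from this
description.

```python
from typing import Dict


def modulated_genes(test: Dict[str, float],
                    reference: Dict[str, float],
                    fold_threshold: float = 2.0) -> Dict[str, str]:
    """Label each gene 'up' or 'down' when its test/reference ratio passes the cutoff.

    Genes missing from the reference, or with non-positive levels, are skipped
    because a ratio cannot be formed for them.
    """
    result: Dict[str, str] = {}
    for gene, level in test.items():
        ref = reference.get(gene)
        if ref is None or ref <= 0 or level <= 0:
            continue
        ratio = level / ref
        if ratio >= fold_threshold:
            result[gene] = "up"
        elif ratio <= 1.0 / fold_threshold:
            result[gene] = "down"
    return result
```

The reference argument can equally be a pre-selected standard profile or a second test
profile, matching the alternatives listed above.
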

The output is preferably hyperlinked, so that the operator can easily switch from,
for example, a listing of the similar standard expression profiles to a listing of the
modulated genes in a selected standard expression profile, or from a gene listed in the test
profile to a list of the standard expression profiles in which the gene is similarly
modulated, or to a list of the standard compounds (and/or conditions) which appear to
modulate the selected gene. The output can comprise correlation information that highlights
features in common between different genes, targets, profiles, compounds, assays, and
the like, to assist the user in drawing useful correlations. For example, the output can
contain a list of genes that were modulated in the user's experiment with a selected
compound: if a plurality of the genes are indicated as associated with liver toxicity, the
system can prompt the user that the compound is associated with a toxic drug signature,
and prompt the user to continue with the next compound. Conversely, the output could
indicate previously unnoticed associations between different pathways, leading the user
to explore a hitherto unknown connection. The output preferably includes hyperlinks to
product information, encouraging the user to purchase or order one or more products
from a selected vendor, where the product(s) relate specifically to the focus of the
database inquiry and the correlation information that results and is presented back to the user
to facilitate hypothesis generation. For example, the output can provide links to products
useful for confirming the apparent activity of a compound, for measuring biological
activity directly, for assaying the compound for possible side effects, and the like,
prompting the user to select products useful in the next stage of experimentation.
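
The liver-toxicity prompt described above can be illustrated with a short Python sketch.
It assumes that gene annotations are available as a mapping from gene identifier to a
set of annotation labels, and it reads "a plurality" as two or more genes; the function
name, the label string, and the message wording are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def toxicity_prompt(modulated: Iterable[str],
                    annotations: Dict[str, Set[str]],
                    label: str = "liver toxicity",
                    min_genes: int = 2) -> Optional[str]:
    """Return a warning when at least min_genes modulated genes carry the label."""
    flagged = sorted(g for g in modulated if label in annotations.get(g, set()))
    if len(flagged) >= min_genes:
        return (f"Warning: {len(flagged)} modulated genes are annotated with "
                f"{label}: {', '.join(flagged)}")
    return None
```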

The system is preferably provided with an algorithm for assessing similarity of
compounds. Suitable methods for comparing compounds and determining their
morphological similarity include "SD-MI", as set forth in copending application USSN
09/475413, incorporated herein by reference in full, Tanimoto similarity (Daylight
Software), and the like. Preferably, the system can be queried for any compounds that are
similar to the test compound in structure and/or morphology. The output from this query
preferably includes the corresponding standard expression profiles (or hyperlinks to the
corresponding standard expression profiles), and preferably further includes a listing,
description, or hyperlink to an assay capable of determining the biological activity of the
standard and/or test compound.
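
As an illustration of the Tanimoto measure mentioned above, the following Python sketch
computes the coefficient for two structural fingerprints and uses it to query a small
compound library. It assumes each fingerprint is already available as a set of "on" bit
positions; how fingerprints are generated is outside the scope of the sketch, and the
function names and the 0.7 cutoff are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similar_compounds(query_fp: Set[int],
                      library: Dict[str, Set[int]],
                      threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)
```

Each returned compound name could then be used to look up the corresponding standard
expression profiles and assays, as described above.
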

Thus, for example, if the user inputs an experimental expression profile resulting
from incubation of test cells with a particular experimental compound, the user can obtain
an output comprising an estimate of the quality of the data, an identification of the genes
affected by the compound, a listing of similar profiles and the conditions under which
they were obtained (for example, the compounds used), and a list of compounds having a
structural similarity. The output can be provided in a hyperlinked format that permits the
user to then investigate and explore the data. For example, the user can examine which
genes are modulated, and determine whether or not the genes have yet been characterized
as to function or activity, and under what conditions each gene is modulated in a similar
fashion. Alternatively, the user can compare the profile obtained with the profile of a
desired outcome, for example comparing the profile obtained by incubation of diseased
or infected tissue with a test compound against a profile obtained from healthy
(unperturbed) tissue. Alternatively, the user can compare the profile with the profiles obtained
using standard compounds, for example using a drug of known activity, mechanism of
action, and specificity, thus determining whether the test compound operates by a
different mechanism, or if by the same mechanism whether it is more or less active than the
standard. Additionally, the user can compare the structure of the test compound with the
structures of other compounds with similar profiles (to determine which structural
features of the compounds are common, and thus likely to be important for activity), or can
compare the compound's profile with the profiles obtained from structurally similar
compounds in general.

The system can be configured as a single, integrated whole, or can be distributed
over a variety of locations. For example, the system can be provided as a central
database/server with remotely-located access units. The remote access units can be provided
with sufficient system capability to accept and interpret test gene expression profiles, and
to compare the test profiles with standard gene expression profiles. Remote units can
further be provided with a copy of some or all of the database information. Optionally, the
remote system can be used to upload test gene expression profiles to the central system to
update the central database, or a "private" database supplementary to the main database
can be stored in or near the remote unit.

Further, the system can be divided into "vendor" and "client" portions, separating
segments of the system into any economically useful subsets, in which interaction
between a vendor unit and a client unit is monitored and/or governed by the client's state.
For example, the system can be configured to treat a primary database as a vendor unit,
and remote access units as client units. The vendor database can be configured to
respond to a plurality of different permission levels, wherein lower permission levels are
granted access to only a restricted subset of the available data, with successively higher
levels obtaining access to greater amounts of data. For example, the lowest permission
level can provide access only to publicly-available gene sequences and public
annotations, without correlations to compounds or profiles. The client system in such cases can
be equipped to provide statistical analysis of the profile generated by the user, the ability
to identify genes within the profile, and the ability to compare gene sequences for
similarity. In this case, the interaction between client unit and vendor unit can be limited to
access to the publicly-available gene sequences, which can be provided electronically, or
exchanged via a storage medium (for example, using CD-ROM, DVD, or the like). The
bulk of the vendor database (for this permission level) can be pre-installed at the client
location, avoiding the need to download large amounts of data (for example, limiting
downloads only to updates). This level can be essentially unrestricted, i.e., allowing
public access without need for a pre-existing vendor-client relationship.

An intermediate permission level can provide access to a larger subset of data, for
example including links to some or all of the available profile and compound data in
addition to the information provided to the lower permission level. In this case, the
interaction between client and vendor systems occurs contemporaneously with or after a client
account is established, determining the level of access to be granted the client. If
conducted electronically, the interaction is preferably accomplished through means of a
secure transaction, to ensure that neither the vendor data nor the client queries are
rendered non-confidential. Such transactions can be conducted, for example, by adapting
the systems and methods disclosed in US 5,724,424, incorporated herein by reference in
full. The data in this case can be limited to compounds that are publicly known (for
example, commercially available, or disclosed in patents or the like) and profile data
related to those compounds. Alternatively, the system can be arranged so that the client
obtains access only to a specific field, for example, profiles related to diabetic conditions,
autoimmune conditions, cancer, and the like. For cases of intermediate permission, the
vendor system can filter output before it is transmitted to the client system, to ensure that
only the permitted degree of information is distributed. The vendor system can also filter
input, to ensure that vendor system resources are not consumed in preparing answers that
cannot be delivered to the client system.

At the penultimate permission level, the client is granted access to all data in the
database except for data that is proprietary, restricted, or exclusively granted to
another client. The ultimate permission level may be available only to the vendor itself,
or can be made available to one or more clients if no exclusivity is granted to clients.
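
By way of illustration only, the tiered permission levels and the output and input
filtering described above might be sketched as follows. The names Level, Record,
can_view, filter_output and query_allowed, and the record attributes, are hypothetical
and do not appear in the specification; the rule that openly accessible records remain
visible at every level is likewise an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    """Hypothetical labels for the permission levels described above."""
    PUBLIC = 0        # essentially unrestricted access
    INTERMEDIATE = 1  # publicly known compounds, or one specific field
    PENULTIMATE = 2   # all data not proprietary or exclusive to another client
    ULTIMATE = 3      # all data


@dataclass(frozen=True)
class Record:
    name: str
    field: str                          # e.g. "diabetes", "cancer"
    open_access: bool = False           # part of the lowest-level subset
    publicly_known: bool = True         # commercially available, patented, etc.
    proprietary: bool = False           # proprietary or otherwise restricted
    exclusive_to: Optional[str] = None  # client holding an exclusive grant


def can_view(rec: Record, level: Level, client: str,
             field: Optional[str] = None) -> bool:
    """Decide whether one record may be delivered to one client."""
    if level == Level.ULTIMATE:
        return True
    if rec.proprietary or rec.exclusive_to not in (None, client):
        return False
    if level == Level.PENULTIMATE or rec.open_access:
        return True
    if level == Level.INTERMEDIATE:
        # Either a field-limited grant or the publicly-known subset.
        return rec.field == field if field is not None else rec.publicly_known
    return False


def filter_output(records: List[Record], level: Level, client: str,
                  field: Optional[str] = None) -> List[Record]:
    """Output filter: drop records the client is not permitted to receive."""
    return [r for r in records if can_view(r, level, client, field)]


def query_allowed(query_field: str, level: Level,
                  field: Optional[str] = None) -> bool:
    """Input filter (simplified): reject a query outside a field-limited
    grant before any vendor resources are spent answering it."""
    if level == Level.INTERMEDIATE and field is not None:
        return query_field == field
    return True
```

In this sketch the input filter is checked before a search is run, and the output filter
is applied to the search results before they are transmitted to the client system.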

Additionally, the system can include provisions for accepting new data from a remote
client, for example, to enable a user to store his or her own data on the vendor
server. Access to such client data can be restricted to only the same client, or can be
made available to all clients or a subset thereof (for example, in exchange for a credit
or other privilege).

Fig. 1 illustrates a system of the invention, comprising vendor server 10 containing
vendor database 12. Vendor database 12 in turn contains a genomic database 14, a
compound database 16, and a profile database 18, which in turn contain optional private
(user) databases 15, 17, and 19. Alternatively, the private databases can be physically
located outside the vendor databases, for example, elsewhere within the vendor system or
maintained in parallel within the user's site. The vendor databases can further comprise
a product database 30 maintained within the vendor system, and/or an external product
database 32 linked to the vendor system. The product databases can contain information
regarding products available from the vendor, a third-party vendor, or both. One or both
of the product databases can further comprise user-specific data (31, 33) such as, for
example, user account information (account number, format preferences, shipping
addresses, prior order history, authorization level, and the like), the user's notes or
annotations regarding particular products, and the like. The product databases are
preferably provided with hyperlinks that facilitate user purchases of the products
displayed. The vendor system is connected to a plurality of user systems 50, 51, 52,
which in turn contain individual user databases 55, 56, 57. The user systems can
communicate with the vendor system by any convenient medium, including, without
limitation, direct connection, distributed network (LAN or WAN), internet connection,
virtual private network (VPN), direct dial-in, and the like. The hardware employed for
use in the method of the invention can comprise general-purpose computers, for example
currently-available personal computers and workstations, or special-purpose terminals
designed for this application.
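
By way of illustration only, the nesting of the genomic, compound, and profile databases
of Fig. 1, each with optional private (user) databases, might be modeled as in the
following sketch. The class names SubDatabase and VendorDatabase, the dictionary-based
storage, and the rule that a user's private entry takes precedence over the public entry
are assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SubDatabase:
    """A genomic (14), compound (16), or profile (18) database together
    with optional private per-user stores (15, 17, 19)."""
    public: Dict[str, dict] = field(default_factory=dict)
    private: Dict[str, Dict[str, dict]] = field(default_factory=dict)

    def store_private(self, user: str, key: str, value: dict) -> None:
        """Save a user's own record in that user's private store."""
        self.private.setdefault(user, {})[key] = value

    def lookup(self, key: str, user: Optional[str] = None) -> Optional[dict]:
        """Return the user's private record if one exists (an assumed
        precedence rule), otherwise the public record, otherwise None."""
        if user is not None and key in self.private.get(user, {}):
            return self.private[user][key]
        return self.public.get(key)


@dataclass
class VendorDatabase:
    """Vendor database 12 holding the three constituent databases."""
    genomic: SubDatabase = field(default_factory=SubDatabase)
    compound: SubDatabase = field(default_factory=SubDatabase)
    profile: SubDatabase = field(default_factory=SubDatabase)
```

The same lookup interface could equally be backed by private databases located outside
the vendor databases, as contemplated above.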

Fig. 2 illustrates a simple flow diagram for an embodiment of the invention. The user
may begin by uploading data into the system 200 (or otherwise acquiring profile data), or
alternatively may simply begin by browsing 205 for a gene, compound, or profile of
interest already present in the system. If new data is added, the data can optionally be
evaluated and validated 210. Optionally, the new data can be uploaded to the primary
database, as either a public or private addition, or can be stored in the user portion of
the system 215. After data validation (if any), the data is examined by the system, and
the genes and profile identified 220. This result is displayed 230, along with hyperlinks
to related product information. Preferably, the results are displayed in a manner that
highlights correlations between similar expression profiles, the profiles of similar
compounds, the profiles of related genes, and the like. The user can then select more
information regarding one or more related compounds 231, genes 233, profiles 235, and the
like, at which point the system can display relevant compound products 232, relevant
clones and/or bioassay products 234, or relevant array products 236. The output display
preferably facilitates selection of relevant products by the user, flagging selected
products 240 (for example, adding them to a "shopping cart" system). The user can then
select 245 a path of inquiry, and search for compounds of similar structure, morphology,
or activity (in terms of profile), for selected genes or genes of similar sequence or
known function, or for similar profiles 205. These results are displayed 230, and the
user invited to continue browsing until finished. Alternatively, the user can pre-select
various forms of output, for example, selecting to have the initial data display include
a listing of similar compounds linked to displays of their profiles, or a listing of the
experimental profile along with a list of similar profiles ranked by degree of
similarity. Alternatively, the user can upload a chemical structure (whether real or
hypothetical), and obtain a display of a predicted profile extrapolated from the profiles
of morphologically similar compounds.
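
By way of illustration only, ranking stored profiles by degree of similarity to an
experimental profile, and extrapolating a predicted profile from the profiles of similar
compounds, might be sketched as follows. The specification does not name a similarity
measure or an extrapolation method; the use of Pearson correlation and of a
similarity-weighted average, and the function names pearson, rank_similar and
predict_profile, are assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # gene name -> expression value


def pearson(a: Profile, b: Profile) -> float:
    """Pearson correlation over the genes the two profiles share.
    Returns 0.0 when the correlation is undefined."""
    genes = sorted(set(a) & set(b))
    if len(genes) < 2:
        return 0.0
    xs = [a[g] for g in genes]
    ys = [b[g] for g in genes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_similar(query: Profile,
                 library: Dict[str, Profile]) -> List[Tuple[str, float]]:
    """List stored profiles from most to least similar to the query."""
    scored = [(name, pearson(query, p)) for name, p in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def predict_profile(neighbours: List[Tuple[Profile, float]]) -> Profile:
    """Predicted profile for a (possibly hypothetical) structure, taken as
    the weighted average of the profiles of similar compounds. Each weight
    is a structural similarity score assumed to be supplied by the caller."""
    sums: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for profile, w in neighbours:
        for gene, value in profile.items():
            sums[gene] = sums.get(gene, 0.0) + w * value
            weights[gene] = weights.get(gene, 0.0) + w
    return {g: sums[g] / weights[g] for g in sums if weights[g] > 0}
```

The ranked list returned by rank_similar corresponds to the display of similar profiles
ranked by degree of similarity; any other similarity measure could be substituted.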

These methods can be conducted on a single computer, or can be distributed over a
plurality of computers. For example, steps 200, 205 and 230 can occur on a remote
computer (at the user site), while other steps occur on a local computer or computers, or
at another remote site distinct from the user's site (the vendor server).

Data concerning experimental pharmaceutical compounds and their biological activity are
extremely sensitive, valuable and confidential. In embodiments that include computers or
other hardware at a plurality of locations, it is presently preferred to include some
provision for security, for example by regulating access or by means of encrypted
commands and results. Suitable methods are known in the art, including, for example,
public key encryption and SSL (secure socket layer) connections. Alternatively, rather
than reporting gene expression data in terms of absolute expression, one can report the
data in terms of differences from a given standard. Thus, if gene "A" has an arbitrary
standard expression value of 56 (in arbitrary units), and in an experimental profile gene
"A" is expressed at a level of 97, the data for gene "A" can be reported as expression of
41 rather than 97. A different standard level can be established for each gene employed,
essentially forming an encoding profile. A plurality of different encoding profiles can
be established and enumerated for each user and shared by secure means, with the user and
vendor simply indicating which profile (by number) is used for each transmission.
Further, one can express the data in terms of other arithmetic functions and combinations
of functions of an encoding profile, as long as the original data can be unambiguously
retrieved by the authorized party. For example, the encoding transform for a particular
encoding profile can specify that data for the first gene is expressed as the difference
between the experimental and profile values, while data for the next gene is expressed as
a percentage of the profile value, while data for the third gene is expressed as the
difference between the third experimental value and the second experimental value, and
the like. If additional security is desired, one can establish encoding profiles and
transforms that change depending on other parameters, for example by date, by user
number, by time of file modification, by number of data sets, and the like, and
combinations thereof. Alternatively, one can specify a large number of available encoding
profiles, and specify in advance a random sequence of profiles to employ, avoiding the
identification of any profile during transmission of data.
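
By way of illustration only, an encoding profile combining the three transforms mentioned
above (difference from the profile value, percentage of the profile value, and difference
from the preceding experimental value) might be sketched as follows. The names Rule,
encode, decode, transmit and receive, the mode labels, and the message layout are
hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Rule:
    gene: str
    standard: float  # standard expression value (arbitrary units)
    mode: str        # "diff", "percent", or "prev" (hypothetical labels)


def _validate(n_values: int, profile: Sequence[Rule]) -> None:
    if n_values != len(profile):
        raise ValueError("one value is required per gene in the profile")
    for i, rule in enumerate(profile):
        if rule.mode not in ("diff", "percent", "prev"):
            raise ValueError(f"unknown mode: {rule.mode}")
        if rule.mode == "percent" and rule.standard == 0:
            raise ValueError("percent mode needs a nonzero standard")
        if rule.mode == "prev" and i == 0:
            raise ValueError("the first gene has no preceding value")


def encode(values: Sequence[float], profile: Sequence[Rule]) -> List[float]:
    """Express each experimental value relative to the encoding profile."""
    _validate(len(values), profile)
    out = []
    for i, (v, rule) in enumerate(zip(values, profile)):
        if rule.mode == "diff":
            out.append(v - rule.standard)          # difference from standard
        elif rule.mode == "percent":
            out.append(100.0 * v / rule.standard)  # percentage of standard
        else:
            out.append(v - values[i - 1])          # difference from previous value
    return out


def decode(encoded: Sequence[float], profile: Sequence[Rule]) -> List[float]:
    """Recover the experimental values; genes are decoded in order because
    "prev" entries depend on the value decoded just before them. Recovery
    is exact only up to floating-point rounding."""
    _validate(len(encoded), profile)
    values: List[float] = []
    for e, rule in zip(encoded, profile):
        if rule.mode == "diff":
            values.append(e + rule.standard)
        elif rule.mode == "percent":
            values.append(e * rule.standard / 100.0)
        else:
            values.append(e + values[-1])
    return values


def transmit(values: Sequence[float], number: int,
             table: Dict[int, Sequence[Rule]]) -> dict:
    """Send only the profile number and the encoded data."""
    return {"profile": number, "data": encode(values, table[number])}


def receive(message: dict, table: Dict[int, Sequence[Rule]]) -> List[float]:
    """Look up the shared profile by number and decode."""
    return decode(message["data"], table[message["profile"]])
```

With a first rule of Rule("A", 56.0, "diff"), an experimental value of 97 for gene "A" is
transmitted as 41, matching the example given above. The table of enumerated profiles is
assumed to have been shared in advance by secure means.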






